05. Feature Extraction I

Feature Extraction

Feature Creation Summary

In the previous exercise, we thought about how to distinguish between activity types based on the data. Because the dataset is small, it is important to recognize that we have also just overfit ourselves to it. During data exploration we looked at the entire dataset, so our brains may have picked up on patterns that are specific to that dataset. This almost happened to me: one pattern seemed to distinguish biking from walking, but one subject's walking data looked much more like biking data than walking data. It is safer to use literature-based features, particularly with a limited dataset. Other researchers worked with different datasets, so their features are much less likely to be overfit to yours.

Summary

With the previous exercise, we’ve started to think about the data and how we might build features to separate the signals into activity classes. While that’s a great exercise, it’s important to recognize that we’ve also just overfit ourselves to the dataset. During data exploration we looked at the entire dataset, and now our brains might pick up on patterns that are specific to that dataset. This almost happened to me, in fact. One of my observations was that the x and z channels overlap for walking and not for biking. However, this isn’t true for S6. There could easily have been a dataset in which S6 did not participate, and I would have fully believed that the x and z accelerometer channels overlap when people are walking. If I had put this feature in my model, it would not have generalized to S6, and the performance of my model would have degraded significantly.
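One way to catch a subject-specific pattern like this is to evaluate a candidate rule separately for each subject instead of over the pooled data. The sketch below is illustrative only: the helper names `channel_gap` and `rule_holds_per_subject`, the subject IDs, the threshold, and the synthetic numbers are all hypothetical and are not taken from the course dataset or notebook.

```python
import numpy as np

def channel_gap(x, z):
    """Absolute difference between the mean x and mean z acceleration."""
    return abs(np.mean(x) - np.mean(z))

def rule_holds_per_subject(windows, threshold):
    """Check the rule 'x and z overlap while walking' for each subject.

    windows: dict mapping a subject ID to an (x, z) pair of 1-D arrays
             recorded while that subject was walking.
    Returns a dict mapping subject ID to True if the gap is below threshold.
    """
    return {
        subj: bool(channel_gap(x, z) < threshold)
        for subj, (x, z) in windows.items()
    }

# Synthetic stand-in data (NOT the course dataset): two subjects whose x and z
# channels overlap, and one subject whose channels are offset from each other.
rng = np.random.default_rng(0)
walking = {
    "A": (rng.normal(0.0, 0.1, 500), rng.normal(0.0, 0.1, 500)),
    "B": (rng.normal(0.1, 0.1, 500), rng.normal(0.1, 0.1, 500)),
    "C": (rng.normal(0.0, 0.1, 500), rng.normal(1.0, 0.1, 500)),
}
result = rule_holds_per_subject(walking, threshold=0.2)

# Subjects for whom the hand-crafted rule fails.
outliers = [subj for subj, ok in result.items() if not ok]
```

A rule that fails for even one subject in a small dataset is a warning sign that it may not generalize to people outside the dataset.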

For small datasets like this, looking to the literature for existing features is a great way to avoid overfitting, because the researchers who came up with these features looked at different datasets from ours. We are going to use features described in:

  • Mehrang S, Pietilä J, Korhonen I. An Activity Recognition Framework Deploying the Random Forest Classifier and A Single Optical Heart Rate Monitoring and Triaxial Accelerometer Wrist-Band. Sensors. 2018;18:613. doi: 10.3390/s18020613.
  • Liu S, Gao RX, Freedson PS. Computational methods for estimating energy expenditure in human physical activities. Med Sci Sports Exerc. 2012;44:2138–2146. doi: 10.1249/MSS.0b013e31825e825a.

We describe some of these features in an IPython notebook and leave some of the implementation to you in the following exercise!
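As a rough illustration of what literature-style features can look like, the sketch below computes a few time-domain and frequency-domain features from one window of triaxial accelerometer data. The specific features, the function name `extract_features`, the sampling rate `fs`, and the synthetic window are assumptions chosen for illustration; they are not the exact feature set from the papers above or from the course notebook.

```python
import numpy as np

def extract_features(acc, fs):
    """Compute illustrative features from one window of accelerometer data.

    acc: array of shape (n_samples, 3) holding the x, y, z channels.
    fs:  sampling rate in Hz (an assumed parameter).
    Returns a dict mapping feature name to a float.
    """
    acc = np.asarray(acc, dtype=float)
    features = {}
    for i, name in enumerate("xyz"):
        ch = acc[:, i]
        # Time-domain statistics of the channel.
        features[f"mean_{name}"] = float(np.mean(ch))
        features[f"std_{name}"] = float(np.std(ch))
        # Frequency domain: remove the mean so the DC bin does not dominate,
        # then take the frequency of the largest spectral peak.
        spectrum = np.abs(np.fft.rfft(ch - np.mean(ch)))
        freqs = np.fft.rfftfreq(len(ch), d=1.0 / fs)
        features[f"dominant_freq_{name}"] = float(freqs[np.argmax(spectrum)])
    # Cross-channel feature: Pearson correlation between x and z.
    features["corr_xz"] = float(np.corrcoef(acc[:, 0], acc[:, 2])[0, 1])
    return features

# Synthetic example (not course data): a 10 s window sampled at 50 Hz
# with a 2 Hz oscillation on every axis.
fs = 50
n_samples = 10 * fs
t = np.arange(n_samples) / fs
window = np.column_stack([
    np.sin(2 * np.pi * 2 * t),
    0.5 * np.sin(2 * np.pi * 2 * t),
    np.sin(2 * np.pi * 2 * t) + 1.0,
])
feats = extract_features(window, fs)
```

In practice a function like this would be applied to each window of the recording, producing one row of features per window for the classifier.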

Notebook Review

If you want to interact with the notebook from the video, you can find it in the repo at /activity-classifier/walkthroughs/feature-extraction/. The dataset used throughout this lesson is at the top of the lesson directory, in /activity-classifier/data/. No workspace is provided for this concept, because the starter code for the following exercise is the same code you saw in this concept.

Exercise

Exercise 2: Feature Extraction

Instructions

  1. Complete the Offline or Online instructions below.
  2. Read through the whole .ipynb.
  3. Complete all the code cells that contain ## Your Code Goes Here.

Offline

  1. In the repo, under /activity-classifier/exercises/2-feature-extraction, you should find the following file:
       • 2-feature-extraction.ipynb
  2. The dataset used throughout this lesson is at the top of the lesson directory, in /activity-classifier/data/.
  3. Open the Python notebook and associated files in your editor of choice.

Note: Instructions for setting up your local environment can be found in Introduction to Wearable Data's Concept Developer Workflow.

Online

  1. Go to the next concept. The notebook 2_feature_extraction.ipynb should already be open, and the workspace should already contain the appropriate data folder.